30 research outputs found
IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Fine-tuning pre-trained language models (PTLMs), such as BERT and its better
variant RoBERTa, has been a common practice for advancing performance in
natural language understanding (NLU) tasks. Recent advances in representation
learning show that isotropic (i.e., unit-variance and uncorrelated) embeddings
can significantly improve performance on downstream tasks with faster
convergence and better generalization. The isotropy of the pre-trained
embeddings in PTLMs, however, is relatively under-explored. In this paper, we
analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with
straightforward visualization, and point out two major issues: high variance in
their standard deviation, and high correlation between different dimensions. We
also propose a new network regularization method, isotropic batch normalization
(IsoBN) to address the issues, towards learning more isotropic representations
in fine-tuning by dynamically penalizing dominating principal components. This
simple yet effective fine-tuning method yields about 1.0 absolute increment on
the average of seven NLU tasks.
Comment: AAAI 202
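The two issues named in the abstract (uneven standard deviations across dimensions, and correlation between dimensions) can be measured directly on any matrix of embeddings. The following is a minimal diagnostic sketch, not the IsoBN method itself. The function names, the toy data, and the two summary statistics (`std_spread`, `mean_abs_corr`) are illustrative assumptions that do not come from the paper.

```python
import math

def column_means(rows):
    """Mean of each embedding dimension."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def column_stds(rows):
    """Population standard deviation of each embedding dimension."""
    n = len(rows)
    means = column_means(rows)
    return [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
        for j in range(len(rows[0]))
    ]

def correlation_matrix(rows):
    """Pearson correlation between every pair of dimensions."""
    n, d = len(rows), len(rows[0])
    means, stds = column_means(rows), column_stds(rows)
    corr = [[0.0] * d for _ in range(d)]
    for a in range(d):
        for b in range(d):
            cov = sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / n
            corr[a][b] = cov / (stds[a] * stds[b])
    return corr

def isotropy_report(rows):
    """Hypothetical summary: isotropic embeddings would give
    std_spread close to 1 and mean_abs_corr close to 0."""
    stds = column_stds(rows)
    corr = correlation_matrix(rows)
    d = len(stds)
    off_diag = [abs(corr[a][b]) for a in range(d) for b in range(d) if a != b]
    return {
        "std_spread": max(stds) / min(stds),
        "mean_abs_corr": sum(off_diag) / len(off_diag),
    }

# Toy "embeddings" (made up): dimension 1 is exactly 5x dimension 0,
# so the two are perfectly correlated and have very different scales.
toy = [[1, 5, 1], [2, 10, -1], [3, 15, -1], [4, 20, 1]]
report = isotropy_report(toy)
print(report)
```

On real [CLS] embeddings, `rows` would be one vector per input example. How the embeddings are extracted is outside the scope of this sketch.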
Automatic Extraction of Commonsense LocatedNear Knowledge
The LocatedNear relation is a kind of commonsense knowledge describing two
physical objects that are typically found near each other in real life. In this
paper, we study how to automatically extract such relationships through a
sentence-level relation classifier and aggregating the scores of entity pairs
from a large corpus. Also, we release two benchmark datasets for evaluation and
future research.
Comment: Accepted by ACL 2018. A preliminary version is presented on
AKBC@NIPS'1
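The aggregation step described in the abstract can be illustrated with a short sketch. The abstract does not say how the sentence-level scores are combined, so the summation used here is an assumption. Treating the relation as symmetric, the function name, and the example scores are also assumptions made for illustration.

```python
from collections import defaultdict

def aggregate_pair_scores(sentence_predictions):
    """Combine per-sentence classifier scores into one score per object pair.

    sentence_predictions: iterable of (object_a, object_b, score) tuples,
    one per sentence in which the pair co-occurs.
    Assumption: scores are summed and the pair is treated as unordered.
    """
    totals = defaultdict(float)
    for obj_a, obj_b, score in sentence_predictions:
        pair = tuple(sorted((obj_a, obj_b)))  # ("table", "cup") -> ("cup", "table")
        totals[pair] += score
    # Highest aggregate score first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

# Made-up classifier outputs for three sentences.
predictions = [
    ("cup", "table", 0.9),
    ("table", "cup", 0.8),
    ("cup", "moon", 0.1),
]
ranked = aggregate_pair_scores(predictions)
print(ranked)
```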
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
Low-rank adaptations (LoRA) are often employed to fine-tune large language
models (LLMs) for new tasks. This paper investigates LoRA composability for
cross-task generalization and introduces LoraHub, a strategic framework devised
for the purposive assembly of LoRA modules trained on diverse given tasks, with
the objective of achieving adaptable performance on unseen tasks. With just a
few examples from a novel task, LoraHub enables the fluid combination of
multiple LoRA modules, eradicating the need for human expertise. Notably, the
composition requires neither additional model parameters nor gradients. Our
empirical results, derived from the Big-Bench Hard (BBH) benchmark, suggest
that LoraHub can effectively mimic the performance of in-context learning in
few-shot scenarios, excluding the necessity of in-context examples alongside
each inference input. A significant contribution of our research is the
fostering of a community for LoRA, where users can share their trained LoRA
modules, thereby facilitating their application to new tasks. We anticipate
this resource will widen access to and spur advancements in general
intelligence as well as LLMs in production. Code will be available at
https://github.com/sail-sg/lorahub.
Comment: Work in progress. The first three authors contributed equally to this
wor
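The idea of combining several LoRA modules without gradients can be sketched on toy data. The abstract specifies neither the composition rule nor the search procedure. This sketch assumes a weighted element-wise sum of module parameters and uses plain random search as a stand-in gradient-free optimizer. The flat parameter lists, `few_shot_loss`, and all numbers are hypothetical.

```python
import random

def compose(modules, weights):
    """Weighted element-wise sum of same-shaped flat parameter lists.
    Assumption: this stands in for merging LoRA modules."""
    return [
        sum(w * m[i] for w, m in zip(weights, modules))
        for i in range(len(modules[0]))
    ]

def random_search(loss_fn, n_modules, iters=200, scale=0.5, seed=0):
    """Gradient-free search over composition weights.
    Keeps a candidate only if it lowers the loss."""
    rng = random.Random(seed)
    best_w = [1.0 / n_modules] * n_modules  # start from a uniform average
    best_loss = loss_fn(best_w)
    for _ in range(iters):
        cand = [w + rng.gauss(0.0, scale) for w in best_w]
        loss = loss_fn(cand)
        if loss < best_loss:
            best_w, best_loss = cand, loss
    return best_w, best_loss

# Toy setup: three "modules" with two parameters each, and a target that
# plays the role of whatever fits the few examples from the new task.
modules = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [0.2, 0.9]

def few_shot_loss(weights):
    merged = compose(modules, weights)
    return sum((p - t) ** 2 for p, t in zip(merged, target))

weights, loss = random_search(few_shot_loss, len(modules))
print(weights, loss)
```

Only the composition weights are searched, so the sketch adds no model parameters and computes no gradients, which matches the two properties the abstract highlights.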
Unsupervised Cross-Task Generalization via Retrieval Augmentation
Humans can perform unseen tasks by recalling relevant skills that are
acquired previously and then generalizing them to the target tasks, even if
there is no supervision at all. In this paper, we aim to improve such
cross-task generalization ability of massive multi-task language models such as
T0 (Sanh et al., 2021) in an unsupervised setting. We propose a
retrieval-augmentation method named ReCross that takes a few unlabelled
examples as queries to retrieve a small subset of upstream data and uses them
to update the multi-task model for better generalization. Our empirical results
show that the proposed ReCross consistently outperforms non-retrieval baselines
by a significant margin.
Comment: Project website: https://inklab.usc.edu/ReCross
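The retrieval step (unlabelled target examples used as queries over upstream data) can be illustrated with a simple stand-in retriever. The abstract does not describe the retriever, so the bag-of-words cosine similarity, the max-over-queries scoring, and the example strings below are assumptions. The model-update step is omitted.

```python
import math
from collections import Counter

def bow(text):
    """Lower-cased bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(queries, upstream, k=2):
    """Return the k upstream examples most similar to any query.
    Assumption: an example's score is its best match over all queries."""
    q_vecs = [bow(q) for q in queries]
    scored = []
    for idx, example in enumerate(upstream):
        vec = bow(example)
        scored.append((max(cosine(q, vec) for q in q_vecs), idx))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, stable on ties
    return [upstream[i] for _, i in scored[:k]]

# Made-up upstream pool and one unlabelled query from the target task.
upstream = [
    "translate the sentence into french",
    "is the review positive or negative",
    "what is the sentiment of this review",
    "add the two numbers",
]
queries = ["sentiment of the movie review"]
subset = retrieve(queries, upstream, k=2)
print(subset)
```

In the setting the abstract describes, the retrieved subset would then be used to update the multi-task model.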